1.  Introduction

This report describes research and support work performed by the University of Central Florida
(UCF) and Institute for Simulation and Training (IST) in support of the OPCODE Project funded
by the Defense University Research Instrumentation Program (DURIP).

The objective of the OPCODE project was to construct a large Beowulf-type computing cluster
to support AFOSR and other DoD projects, as well as to support educational objectives in the
areas of parallel and distributed computing. The originally proposed cluster was to be composed
of 192 Athlon 800 MHz processors, supported by 256 MB of RAM and 10 GB of disk storage
per node. Due to AMD donations and price drops, as well as the choice to use a dual-processor
motherboard design, we were able to purchase sufficient parts to build two 192-processor
clusters, using AMD 1500+ processors running at 1.4 GHz.


1.1  Background 

Beowulf is a multi-computer architecture used for parallel computations. A Beowulf system
usually consists of one server node and one or more client nodes connected together via Ethernet
or some other network. It is built from commodity hardware components, such as any PC
capable of running Linux, standard Ethernet adapters, and switches. It does not contain
any custom hardware components and is trivially reproducible. Beowulf also uses commodity
software like Parallel Virtual Machine (PVM) or Message Passing Interface (MPI). The server
node controls the whole cluster and serves files to the client nodes. It is also the cluster's console
and the gateway to the outside world. Client nodes in a Beowulf system are dumb: they are
configured and controlled by the server node, and do only what they are told to do.


Beowulf  systems  have  been  constructed  from  a  variety  of  parts.  For  the  sake  of  performance 
some  non-commodity  components  (i.e.  produced  by  a  single  manufacturer)  occasionally  have 
been  employed  in  a  few  implementations. 


A CLASS I Beowulf cluster is built entirely from commodity "off-the-shelf" parts. The
advantages of a CLASS I system are:

•  hardware is available from multiple sources (low prices, easy maintenance)

•  no  reliance  on  a  single  hardware  vendor 

•  free  driver  support  from  Linux  community 

•  hardware components based on standards (SCSI, Ethernet, etc.)

In the taxonomy of parallel computers, Beowulf clusters fall somewhere between MPPs
(Massively Parallel Processors, like the Convex SPP, Cray T3D, Cray T3E, CM5, etc.) and
NOWs (Networks of Workstations). A Beowulf cluster benefits from developments in both these
classes of architecture. MPPs are typically larger and have a lower latency interconnect network
than Beowulf clusters. Using MPPs, programmers are required to consider locality, load
balancing, granularity, and communication overheads. Most programs for MPPs are developed in
message passing style. Such programs can be readily ported to Beowulf clusters.
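
As an illustration of the message passing style mentioned above, the following minimal sketch has one server distribute work to several clients and gather their partial results. It is only a stand-in: Python threads and queues play the role of cluster nodes and the network, whereas on a real Beowulf the same pattern would be written with PVM or MPI. All names (server, client, NUM_CLIENTS) are hypothetical.

```python
import queue
import threading

NUM_CLIENTS = 4  # hypothetical number of client "nodes"


def client(rank, inbox, outbox):
    """Receive (lo, hi) work messages, reply with a partial sum, stop on None."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message from the server
            break
        lo, hi = msg
        outbox.put((rank, sum(i * i for i in range(lo, hi))))


def server(n):
    """Split the sum of squares below n across the clients and gather results."""
    inboxes = [queue.Queue() for _ in range(NUM_CLIENTS)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=client, args=(rank, inboxes[rank], results))
        for rank in range(NUM_CLIENTS)
    ]
    for w in workers:
        w.start()
    step = (n + NUM_CLIENTS - 1) // NUM_CLIENTS
    for rank in range(NUM_CLIENTS):
        # One work message per client, followed by the shutdown message.
        inboxes[rank].put((min(rank * step, n), min((rank + 1) * step, n)))
        inboxes[rank].put(None)
    total = sum(results.get()[1] for _ in range(NUM_CLIENTS))
    for w in workers:
        w.join()
    return total
```

The clients share no data with each other; everything they know arrives in a message from the server, which is the property that lets such programs move between MPPs and Beowulf clusters.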


Programming a NOW is usually an attempt to harvest unused cycles on an already installed base
of workstations in a lab or on a campus. Programming in this environment requires algorithms
that are extremely tolerant of load balancing problems and large communication latency. These
programs will run directly on a Beowulf. A Beowulf-class cluster computer is distinguished
from a NOW by several subtle but significant characteristics. First, the nodes in the cluster are
dedicated to the cluster. This helps ease the load balancing problem, because the performance of
individual nodes is not subject to external factors. Also, since the interconnection network is
isolated from the external network, the network load is determined only by the application being
run on the cluster. This eases the problems associated with unpredictable latency in NOWs. All
the nodes in the cluster are within the administrative jurisdiction of the cluster. For example, the
Beowulf software provides a global process ID that enables a mechanism for a process to send
signals to a process on another node of the system. This is not allowed on a NOW. Finally,
operating system parameters can be tuned to improve performance. For example, a workstation
should be tuned to provide an interactive feel (instantaneous responses, short buffers, etc.), but in
a cluster the nodes can be tuned to provide better throughput for coarser-grain jobs because they
are not interacting with users.


In 1999, IST built a CLASS I Beowulf cluster, consisting of 17 nodes and 34 Pentium II
processors, with 256 MB of RAM and 8.6 GB of local storage per node. Cluster communications
take place over a three channel-bonded Fast Ethernet network, using low-latency switches to
minimize network traffic. An IEEE 1394 standard alternate network has also been installed to
offer lower latency, improved bandwidth, and deterministic communications.

In 2000, a 128 node cluster was constructed, located in the School of Electrical Engineering and
Computer Science (SEECS) at UCF. This cluster, "SCEROLA", consists of 128 900 MHz
Athlon processors, with 32 GB total RAM and 1.6 Terabytes total secondary storage.


2  Technical  Approach 

Based on our experience in building previous clusters, our approach was to custom-design a
cluster based on the most current commodity hardware available that yielded the best
cost/performance ratio. We investigated the cost/performance ratio of "prebuilt" Beowulf
clusters that are available commercially, and determined that we could obtain a two- to
threefold increase in performance by designing and building our own cluster from
individual components. As an additional benefit, we also viewed the exercise of cluster design
and construction as a worthy academic and educational experience. The tradeoff made for these
benefits was that financial support for the considerable labor involved in the design, construction
and installation of the cluster was not funded by the DURIP program, and had to be supported by
other means. Fortunately, some labor support was provided by the Advanced Tactical
Engagement Simulation Science and Technology (A-TES STO) program funded by U.S. Army
STRICOM, as was mentioned in the original proposal to AFOSR. Other financial support for
labor was derived from UCF matching funds. Ultimately, however, the lack of sufficient labor
funding has resulted in delays in construction that were not fully anticipated in the original
schedule.

2.1  Current  Status 

The OPCODE project has resulted in the purchase of components for the construction of two
192-processor computing clusters, one located at IST (OPCODE I) and the other at SEECS UCF
(OPCODE II). OPCODE I, shown in Figure 1, is completed and has been benchmarked using
SCALAPACK at 86.5 GFLOPS using 144 nodes of the total 192 compute nodes (see Appendix
B for results). The OPCODE II cluster is currently under construction, and is about half
complete. These clusters employ a Fast Ethernet network, linked together over a high-speed,
fully stacked switch. Each node consists of dual AMD 1500+ processors on a Tyan "Tiger" K7
motherboard, with 1 GB of DDR RAM and 20 GB of local disk storage. The compute
nodes are rack mounted with two nodes per tray (see Figure 2). An alternative IEEE 1394
network is also available on 44 of the nodes on each machine. We have contributed
extensively to developing IEEE 1394 drivers for Linux, and one of our patches is included in the
Linux 2.4.x kernel.

Figure 1 - OPCODE I cluster at IST

Figure 2 - Two compute nodes from OPCODE I showing tray for rack-mounting.

2.2  Chronology  of  events  and  design  decisions 
01-2001  -  Notification  of  award 

02-2001 - Meetings held with UCF Office of Research to determine level of UCF matching
funds. Matching funds totaling $54,250 were awarded under the Florida State High-Tech
Corridor I-4 Program. Additional funding from the UCF Presidential Equipment Program was
not made available.

03-2001 - It was decided that two clusters would be built, and the decision was made to split the
award with half administered by IST and the other half by the UCF School of Electrical
Engineering and Computer Science (SEECS).

04-2001 - Funds awarded and accounts provided for both AFOSR and UCF matching funds.
Due to UCF funding procedures, nearly all of the matching funds were deferred until after the
end of the UCF fiscal year (ending 06-2001).

05-2001 - A location for the IST cluster, OPCODE I, was determined, and cooling requirements
were estimated. Quotes for an air conditioning upgrade with a 6.5 ton capacity were obtained.
The online Beowulf site at http://www.phy.duke.edu/brahma/beowulf_online_book/node43.html
was helpful in determining the cooling requirements.

06-2001 - Additional quotes for the power and air conditioning were obtained, allowing for 300
Amps of power. Quotes were obtained for basic items that were not particular to the final design
decisions, such as RAM and NICs.

07-2001 - Performance comparisons between the currently available AMD processors and the
Intel P4 were considered. A decision was made to use the AMD processors in the cluster design,
based on the following considerations: 1) the ability of the AMD processors to perform 3
floating point operations per clock cycle (vs. 2 floating point operations for the P4), 2) the
superior cost/performance ratio offered by the AMD processors, and 3) the high cost of the Rambus
memory modules required by the P4. Two useful sources of information that were considered
follow:

http://arstechnica.com/cpu/01q2/p4andg4e/p4andg4e-1.html
http://www.emulators.com/pentium4.htm

08-2001 - Elitegroup K7S5A motherboards with SiS 736 chipsets were quoted for $65.00. These
boards come with onboard Ethernet, but since the Ethernet capability on these boards was based
on the RealTek chip (RTL8201L) they were ruled out: such NICs are slower than NICs based
on the Tulip or Intel chipsets because they do not perform DMA on receive, resulting in an extra
memory copy.

The HP ProCurve 4100gl switch was investigated due to its low cost, although it does not allow
for channel bonding.

09-2001 - A system based on the dual-processor Tyan Tiger motherboard using Athlon 1.2 GHz
Palomino processors with 1 GB of PC2100 DDR memory was purchased for testing purposes. A
design based on these dual-processor motherboards was chosen. Power requirements were tested
by measuring current draw under a variety of conditions, as shown below:

Situation                 Current (Amps)

Not on                    0.05
powerup                   1.20
post                      1.15
fsck in Linux             1.11
Linux on but idle         1.08
distributed.net rc5-64    1.33
rc5-64 & updatedb         1.36
cpuburn                   1.18
rc5-64 & cpuburn          1.29


It was noted that the cpuburn figures were lower than expected relative to rc5-64. The
reason is that rc5-64 is multi-threaded and so was using both processors, while cpuburn is
single-threaded and so did not use the second processor.

System stability was observed to be good.
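
The measured per-machine current can be scaled up to a rough estimate of the electrical and cooling load of a full cluster. The sketch below is illustrative only and rests on assumptions that are not stated in the report: a 120 V supply, a power factor of 1, and all 103 machines listed in Table III drawing the worst measured current (1.36 A) at once. The function and constant names are hypothetical.

```python
LINE_VOLTS = 120.0             # assumed supply voltage; not stated in the report
BTU_PER_HOUR_PER_WATT = 3.412  # standard conversion factor
BTU_PER_HOUR_PER_TON = 12000   # one ton of refrigeration


def cluster_load(amps_per_machine, machines, volts=LINE_VOLTS):
    """Return (total amps, watts, tons of cooling) for identical machines."""
    total_amps = amps_per_machine * machines
    watts = total_amps * volts  # treats the power factor as 1
    tons = watts * BTU_PER_HOUR_PER_WATT / BTU_PER_HOUR_PER_TON
    return total_amps, watts, tons


# Worst measured case (rc5-64 & updatedb, 1.36 A) for 103 machines.
amps, watts, tons = cluster_load(1.36, 103)
print(f"{amps:.1f} A, {watts / 1000:.1f} kW, {tons:.1f} tons")
```

Under these assumptions the estimate comes to roughly 140 A and a little under 5 tons of cooling, which sits inside the 300 Amp and 6.5 ton capacities quoted for the room. Switches, monitors and lighting are not included.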


09-2001 - Carrie Hermann, Manager of AMD University Relations, was contacted regarding
possible processor donations. A proposal was submitted to AMD requesting 192 Athlon 4
"Palomino" processors. This proposal is attached as Appendix C.

10-2001 - The AMD proposal was denied, but a referral was made to the AMD University Parts
Program. Racks were purchased for the OPCODE I cluster.


11-2001 - Performance of the RAID arrays to be used for the servers in the new cluster was
tested. Performance was tested for both software and hardware RAID at levels 0 and 5. We also
tested the performance with different file systems. Best performance was noted using software
RAID and XFS file systems.

The power and air conditioning for OPCODE I at IST were completed. With a minimal heat load
in the room the temperature stays at a constant 60°F. Five racks and some networking
supplies were purchased for use in OPCODE II.


12-2001 - A decision was made to purchase the Extreme Networks Alpine 3804 switch instead
of the HP ProCurve switch that had previously been considered. This decision was based on
information that the ProCurve 4108GL will not do wire speed to all ports. It is rated at 36.6 Gbps
fabric speed, compared with the 38.4 Gbps needed with all of the ports used at 100TX. Granted,
this shortfall would not apply to us because we will not be using all of the ports. Customer reports
indicated that the previous HP ProCurve model would only perform at about 2/3 of its rated
speed. HP specifies that the switch has a latency of <10 us (FIFO) and is able to have up to 6
trunks with 4 ports in each trunk. Reports on the Beowulf.org list indicated that some people had
a hard time getting trunking (bonding) to work on the previous model ProCurve switch.

The Extreme Networks Alpine 3804 chassis is rated at 32 Gbps, which is more than can be
driven by the 128 10/100 ports the chassis can handle (the Alpine 3808 can handle 64 Gbps with
256 ports). Independent tests have confirmed that the switch will do wire speed to all
ports with no blocking. Customers reported <5 us latency for the switch, although this
specification is not given by Extreme Networks. The Alpine switch can have as many trunks
as needed, with at least 4 ports per trunk. An Extreme Networks representative
informed us that they have an upgrade process, allowing for a possible future upgrade to
channel bonding to all of the nodes by replacing the chassis and purchasing a couple of Ethernet
modules. The smaller chassis we chose does not allow for channel bonding to every node.
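
The wire-speed comparison in this entry is simple arithmetic: a non-blocking fabric must carry every port at line rate in both directions at once. A minimal sketch follows, assuming full-duplex 100 Mbps ports. The function name is hypothetical, and the 192-port count for a fully populated ProCurve is inferred from the 38.4 Gbps figure rather than stated in the text.

```python
def nonblocking_gbps(ports, port_mbps=100, full_duplex=True):
    """Fabric bandwidth (Gbps) needed to run every port at line rate."""
    directions = 2 if full_duplex else 1
    return ports * port_mbps * directions / 1000.0


# Port counts: 192 (inferred for the ProCurve), 128 (Alpine 3804), 256 (Alpine 3808).
for name, ports, rated in [("ProCurve 4108GL", 192, 36.6),
                           ("Alpine 3804", 128, 32.0),
                           ("Alpine 3808", 256, 64.0)]:
    needed = nonblocking_gbps(ports)
    verdict = "non-blocking" if rated >= needed else "blocking"
    print(f"{name}: needs {needed:.1f} Gbps, rated {rated:.1f} Gbps, {verdict}")
```

With these inputs only the ProCurve falls short of the bandwidth its ports could demand, which matches the reasoning given for choosing the Alpine chassis.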


01-2002  -  Purchased  the  Alpine  switches.  Received  donation  from  AMD  of  60  AMD  1500+ 
processors,  and  took  bids  on  the  remaining  176  AMD  1500+  processors  required  for  each  cluster. 

02-2002 - The motherboards and processors were ordered. The racks being used for the
IST OPCODE cluster were painted and supplies were purchased for making the rear panels. One
of the Extreme Networks Alpine switches was tested by using it temporarily in the SEECS
SCEROLA cluster.

03-2002 - We finished purchasing most of the components for the OPCODE clusters. There are
still a couple of items that need to be purchased on some of the matching funds accounts. Wiring
the network of the cluster was begun. Custom cables were made for use between the patch panel
and the switch, and cables were run from the patch panel to each of the computer trays. The
trays that will be holding the motherboards were designed.

04-2002 - Finished wiring the switch to the patch panel and started to configure the switch for use
on the OPCODE I cluster and IST network. Two VLANs were set up on the switch, one for the
cluster network and one for the IST network. Some computers are to be on both VLANs, so
modifications of the network stacks on those machines were necessary to allow for the larger
packets that tagged VLANs impose. The design for the trays that the motherboards will mount on
was finalized and has been sent off to the manufacturer so that a prototype can be made. The
frontend machine for the OPCODE I cluster has been set up and is being configured.
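
The "larger packets" come from the 4-byte IEEE 802.1Q tag that a tagged VLAN inserts into each Ethernet frame. The sketch below shows the resulting maximum frame sizes for the standard 1500-byte payload; the constant and function names are hypothetical, and it does not attempt to show the actual driver changes that were made.

```python
ETH_HEADER_BYTES = 14  # destination MAC + source MAC + EtherType
ETH_FCS_BYTES = 4      # frame check sequence
VLAN_TAG_BYTES = 4     # IEEE 802.1Q tag


def max_frame_bytes(mtu=1500, tagged=False):
    """Largest Ethernet frame on the wire for a given payload size (MTU)."""
    tag = VLAN_TAG_BYTES if tagged else 0
    return ETH_HEADER_BYTES + tag + mtu + ETH_FCS_BYTES


print(max_frame_bytes())             # untagged frame
print(max_frame_bytes(tagged=True))  # 802.1Q tagged frame
```

A network stack or driver that rejects anything longer than the untagged maximum will drop full-size tagged frames, which is one reason machines on both VLANs could need modification.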


05-2002  -  Present 

The OPCODE I cluster is completed and running. SCALAPACK benchmarks give a top
performance rating of approximately 86.5 GFLOPS running 144 of the nodes, as shown in
Appendix B. OPCODE II is under construction, with about half of the trays finished at present.
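
The GFLOPS figures in Appendix B can be related to the problem size and the wall time. The sketch below assumes the operation count conventionally used by the HPL benchmark for solving an N x N dense linear system, 2/3 N^3 + 3/2 N^2; the function name is hypothetical. The inputs are the N and Time values printed in the Appendix B output.

```python
def hpl_gflops(n, seconds):
    """Rate in Gflops for solving an n x n dense system in the given wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2  # assumed HPL operation count
    return flops / seconds / 1e9


# (N, wall time in seconds) taken from the Appendix B output.
for n, seconds in [(66000, 2297.85), (66000, 2255.11), (66000, 2263.99)]:
    print(f"N={n} time={seconds:.2f} s rate={hpl_gflops(n, seconds):.2f} Gflops")
```

The computed rates agree with the Gflops column of the corresponding runs to the printed precision.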

Appendix A - Financial Report

Table I - Original Submitted Budget

(Note - Additional UCF Match ultimately was not obtained, by decision of UCF Office
of Research)

"OPCODE (ORLANDO PARALLEL COMPUTATION DEVELOPMENT ENVIRONMENT)"

DIRECT COSTS                           SPONSOR COST   UCF MATCH (1)   TOTAL BUDGET   ADDITIONAL UCF MATCH (2)
LABOR                                  $              $               $              $
EQUIPMENT                              $ 217,000      $ 54,250.00     $ 271,250      $ 217,000
TRAVEL                                 $              $               $              $
MISCELLANEOUS MATERIALS AND SUPPLIES   $              $               $              $
TOTAL DIRECT COSTS                     $ 217,000      $ 54,250        $ 271,250      $ 217,000
INDIRECT COST                          $              $               $              $
(Total Direct Costs Less Equipment x 42.5%)
TOTAL COST                             $ 217,000      $ 54,250        $ 271,250      $ 217,000

NOTES:

(1) "UCF MATCH" column represents matching funds to be committed to this funding.

(2) "ADDITIONAL UCF MATCH" column represents matching funds that may be offered pending University approval.

Table II - Original Proposed Equipment List (prices obtained 08/16/2000)

Item                                   Description                                            Source                                        Cost       Qty   Total Cost
ASUS K7V VIA KX133                     5 PCI, No Audio, - 1GHz                                www.necxdirect.com                            131.95
48"x18"x72" black wire-frame shelves                                                          Local Hardware Store (Lowe's or equivalent)   64.00      12    768
AMD Athlon Thunderbird                 800MHz Socket A                                        www.motherboards.com                          196.00     192   37632
Hawking 15' Cat.5e cable 5-packs       15' transparent color-coded                            Buy.com                                       12.00      80    960
Quantum 10.2GB EIDE                    Ultra DMA66/33, 7200 RPM                               McGlen, www.mcglen.com                        101.72     192   19530.24
SIIG                                   IEEE1394 3-port                                        www.necxdirect.com                            49.95      192   9590.4
30GB FIREWIRE EXTERNAL HD              5200RPM, 16.6MB per second data transfer               www.necxdirect.com                            369.95
PC133 SDRAM 8NS 256MB                                                                         www.mwave.com                                 305.00     384   117120
Misc                                                                                                                                                         3000
SUPERSTACK II SWITCH 3300 MM           www.3com.com/products/switches/superstack/ss2 3300mm   McGlen, www.mcglen.com                        1,625.62   24    39014.88
Smartlink 100Mb/s NIC 10-packs         www.mwave.com                                          www.mwave.com                                 80.00      60    4800
MID ATX 6-BAY 230W +FLOPPY                                                                    www.necxdirect.com                            67.95      192   13046.4

TOTAL                                                                                                                                                        271166.27
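
The line totals in Table II can be checked against the stated grand total. The sketch below multiplies unit cost by quantity for every row that lists both, adds the miscellaneous amount, and reports what is left over for the two rows that list no quantity (the ASUS K7V motherboard and the 30 GB FireWire drive). The short item labels are informal, and the closing observation about 192 motherboards and one drive is an inference from the arithmetic, not something the report states.

```python
from decimal import Decimal

# (label, unit cost, quantity) for Table II rows that list both values.
ROWS = [
    ("wire-frame shelves", "64.00", 12),
    ("Athlon 800MHz", "196.00", 192),
    ("Cat.5e cable 5-packs", "12.00", 80),
    ("Quantum 10.2GB disk", "101.72", 192),
    ("IEEE1394 3-port card", "49.95", 192),
    ("PC133 256MB SDRAM", "305.00", 384),
    ("SuperStack II switch", "1625.62", 24),
    ("NIC 10-packs", "80.00", 60),
    ("Mid ATX case", "67.95", 192),
]
MISC = Decimal("3000")
TABLE_TOTAL = Decimal("271166.27")

listed = sum(Decimal(cost) * qty for _, cost, qty in ROWS) + MISC
remainder = TABLE_TOTAL - listed
print("listed rows:", listed)
print("unaccounted:", remainder)

# One reading consistent with the remainder: 192 motherboards plus one drive.
candidate = 192 * Decimal("131.95") + 1 * Decimal("369.95")
print("192 x 131.95 + 1 x 369.95 =", candidate)
```

Decimal arithmetic is used so that the cents compare exactly. The remainder and the candidate value coincide, which suggests, without proving, how the two blank rows were budgeted.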

Table III - OPCODE I Machine Components and Budget

Nodes          96
Spares          2
Total Nodes    98
Servers         4
Frontend        1
Total         103

Cluster Node

Item           Vendor       Description                                         Count   Cost
Processor      D&H          Athlon MP 1500+ 1 year Warranty                     2       $159.5
Fan            Monarch      CoolerMaster DP5-I11A-A1                            2       $7.
Motherboard    MicroPro     TYAN s2460 AMD 760MP 266/200 FSB 4 DIMMS DDR 1/5    1       $175.
Memory         Muskin       Muskin PC2100 256 MB CL 2.5 ECC Registered          2       $42.
NIC            AMASTORE     Netgear FA310TX                                     1       $10.
Hard drive     TC           Maxtor 30GB ATA 100 5400RPM 2 MB                    1       $85.
power supply   Axion Tech   Dynapower DP350A 350 Watt 12V atx power supply

Node Total

Server Node

Item                 Vendor        Description                                         Count   Cost
Processor            D&H           Athlon MP 1500+ 1 year Warranty                     2       $159.5
Fan                  Monarch       CoolerMaster DP5-I11A-A1                            2       $7.
Motherboard          Sybercom      TYAN s2460 AMD 760MP 266/200 FSB 4 DIMMS DDR 1/5    1       $175.
Memory               Muskin        Muskin PC2100 256 MB CL 2.5 ECC Registered          4       $42.
NIC                                D-Link DFE-570TX 4 port Ethernet card
power supply         Axion Tech    Dynapower DP400 400 Watt 12V atx power supply
server monitor
server hard drives   Hyper Micro   Maxtor 81GB ATA 100 5400RPM 2 MB
server raid          Hyper Micro   3ware 4-port ata/66 RAID 5 controller (6410)
server case


Item 

Vendor 

switch 

Extreme  Networks 

switch 

Extreme  Networks 

switch 

Extreme  Networks 

switch 

Extreme  Networks 

switch 

Extreme  Networks 

Description 


Networking

Item         Vendor         Description
networking   Sofisticated   Cat5 2xRJ45 surface mount box
networking   Skycraft       48 Port cat 5 patch panel
networking   Home Depot     Telemaster RJ11/RJ45 crimper Ratcheting
networking   Sofisticated   Cat 5 RJ45 jacks solid and stranded 1000 piece
networking   PCTek Online   Cat5e boots
networking   PCTek Online   1000' Cat5e Solid Cable 350MHz Yellow


$  109.( 
$  62.( 


$  3,385.1 


$  1,116.1 


$  1,396.1 
$  1,322.( 


Count  Cost 
114 

J _ 

1 


J _ 

500  $ _ OJ 

4  I  $  44.( 

Total 


$  59.( 
Networking Total

Rack Components

Item      Vendor           Description                                  Count   Cost
Chassis   Turf Equipment   Motherboard Trays w/ Power supply brackets   53      $29.
Chassis   Turf Equipment   Motherboard Tray rails                               $3.
Chassis   Turf Equipment   Rail Supports
Chassis   Turf Equipment   Stop brackets                                14      $3.

Total

Misc

Item        Vendor       Description                                                 Count   Cost
Air Flow    Home Depot   Door bottom molding                                         1       $6.
Tools       Home Depot   Versapack Cordless Screwdriver                              1       $28.
fasteners   Skycraft     500 4" cable ties and 1000 6" cable ties                    1       $27.
fasteners   Skycraft     Standoffs                                                   500     $0.
fasteners   Skycraft     computer screws #6-32 1/4                                   3000    $0.
Fans        Home Depot   Lasko 20" box fan                                           7       $10.
Fans        Skycraft     19" chassis fan                                             1
Air Flow    Home Depot   Hardboard                                                   5       $5.
Paint       Home Depot   Rustoleum Hard Hat Spray Paint (Black)                      10      $4.
Power       Home Depot   Belkin 7 outlet 15A 12ft cord 752J $20000 surge protector   18
UPS         Warehouse    Tripp Lite Smart Pro Net 2200VA                             1       $715.
UPS         Warehouse    Tripp Lite Smart Pro 1400XL                                 2       $559.

Total

KVM

Item   Vendor            Description                            Count   Cost
KVM    Micro Warehouse   Belkin OMNIVIEW Matrix2 2x8 port KVM   1
KVM    Micro Warehouse   OMNIVIEW CAT5 Extender                 2       $222.
KVM    Micro Warehouse   25ft Matrix Cable PS2                  4       $41.
KVM    Micro Warehouse   6ft Matrix Cable PS2                   4       $20.

Total

Shipping

Item       Vendor         Description                                     Count   Cost
shipping   PCTek Online   Cat5e and RJ45 Boots
shipping   Sofisticated   Cat5 2xRJ45 surface mount box and RJ45 plugs
shipping   Muskin         Memory
shipping   Hyper Micro    Server hard drives and Raid 5

Total

Donations

Item         Vendor   Description                        Count   Cost
Processors   AMD      Athlon MP 1500+ 1 year Warranty    30      $159.

Total

Other Expenses

Item           Vendor   Description                                           Count   Cost
Dev machines            Dual Processor Test and Front End Support Machines    15      $900.

Grand Total

Table IV - OPCODE II Cluster Components (Identical to OPCODE I)

Nodes          96
Spares          2
Total Nodes    98
Servers         4
Frontend        1
Total         103

Cluster Node

Item           Vendor       Description                                         Cost
Processor      D&H          Athlon MP 1500+ 1 year Warranty                     $159.
Fan            Monarch      CoolerMaster DP5-I11A-A1                            $7.
Motherboard    MicroPro     TYAN s2460 AMD 760MP 266/200 FSB 4 DIMMS DDR 1/5    $175.
Memory         Muskin       Muskin PC2100 256 MB CL 2.5 ECC Registered          $42.
NIC            AMASTORE     Netgear FA310TX                                     $10.
Hard drive     TC           Maxtor 30GB ATA 100 5400RPM 2 MB                    $85.
power supply   Axion Tech   Dynapower DP350A 350 Watt 12V atx power supply      $42.


Server Node

Item                 Vendor        Description                                         Cost
Processor            D&H           Athlon MP 1500+ 1 year Warranty
Fan                  Monarch       CoolerMaster DP5-I11A-A1
Motherboard          Sybercom      TYAN s2460 AMD 760MP 266/200 FSB 4 DIMMS DDR 1/5
Memory               Muskin        Muskin PC2100 256 MB CL 2.5 ECC Registered
NIC                  AMASTORE      D-Link DFE-570TX 4 port Ethernet card
power supply         Axion Tech    Dynapower DP400 400 Watt 12V atx power supply
server monitor
server hard drives   Hyper Micro   Maxtor 81GB ATA 100 5400RPM 2 MB                    $155.
server raid          Hyper Micro   3ware 4-port ata/66 RAID 5 controller (6410)        $179.


Switch

Item     Vendor             Description                                     Cost
switch   Extreme Networks   Alpine 3804 Chassis                             $4,430.
switch   Extreme Networks   Alpine SMMi Basic L3                            $3,385.
switch   Extreme Networks   Alpine 3800 FM-32Ti (32port 10/100Tx module)    $1,116.
switch   Extreme Networks   Alpine 3800 PS                                  $1,396.
switch   Extreme Networks   Alpine 3804 Service contract (48 Hours)         $1,322.

Networking

Item         Vendor         Description                                       Cost
networking   Sofisticated   Cat5 2xRJ45 surface mount box
networking   Skycraft       48 Port cat 5 patch panel                         $48.
networking   Home Depot     Telemaster RJ11/RJ45 crimper Ratcheting           $42.
networking   Sofisticated   Cat 5 RJ45 jacks solid and stranded 1000 piece    $59.
networking   PCTek Online   Cat5e boots                                       $0.
networking   PCTek Online   1000' Cat5e Solid Cable 350MHz Yellow             $44.

Rack Components

Item      Vendor           Description                                  Count   Cost
Chassis   Turf Equipment   Motherboard Trays w/ Power supply brackets   53      $29.
Chassis   Turf Equipment   Motherboard Tray rails                               $3.
Chassis   Turf Equipment   Rail Supports                                        $10.
Chassis   Turf Equipment   Stop brackets                                14      $3.

Total

Misc

Item        Vendor       Description                                                 Count   Cost
Air Flow    Home Depot   Door bottom molding                                         1       $6.
Tools       Home Depot   Versapack Cordless Screwdriver                              1       $28.
fasteners   Skycraft     500 4" cable ties and 1000 6" cable ties                            $27.
fasteners   Skycraft     Standoffs                                                           $0.
fasteners   Skycraft     computer screws #6-32 1/4                                           $0.
Fans        Home Depot   Lasko 20" box fan                                           7       $10.
Fans        Skycraft     19" chassis fan                                             1       $100.
Air Flow    Home Depot   Hardboard                                                   5       $5.
Paint       Home Depot   Rustoleum Hard Hat Spray Paint (Black)                      10      $4.
Power       Home Depot   Belkin 7 outlet 15A 12ft cord 752J $20000 surge protector   18      $18.
UPS         Warehouse    Tripp Lite Smart Pro Net 2200VA                             1       $715.
UPS         Warehouse    Tripp Lite Smart Pro 1400XL                                 2       $559.

Total

KVM

Item   Vendor            Description                            Count   Cost
KVM    Micro Warehouse   Belkin OMNIVIEW Matrix2 2x8 port KVM   1       $618.
KVM    Micro Warehouse   OMNIVIEW CAT5 Extender                 2       $222.
KVM    Micro Warehouse   25ft Matrix Cable PS2                          $41.
KVM    Micro Warehouse   6ft Matrix Cable PS2                           $20.


Shipping

Vendor         Description
PCTek Online   Cat5e and RJ45 Boots
Sofisticated   Cat5 2xRJ45 surface mount box and RJ45 plugs
Muskin         Memory
Hyper Micro    Server hard drives and Raid 5

Donations

Description                        Count   Cost
Athlon MP 1500+ 1 year Warranty            $159.


Other Expenses

Item           Vendor   Description                                           Count   Cost
Dev machines            Dual Processor Test and Front End Support Machines            $900.

Table V - IST Infrastructure Upgrades (from UCF Cost-sharing funds)

Air and Power Infrastructure Upgrades (IST)

Item          Vendor    Description                                           Count   Cost         Total
Air & Power   Lincoln   300 Amp feed thru 84 circuit panel & 1 Service wire   1       $10,070.00   $10,070.00
Air & Power   Lincoln   6.5 ton HVAC unit & install                           1       $8,977.00    $8,977.00
Air & Power   Lincoln   Construction Management Fee (5%)                      1       $952.35      $952.35
Air & Power   Lincoln   30 amp 240 volt outlet                                1       $225.00      $225.00
Air & Power   Lincoln   20 amp 120 volt outlet                                14      $125.00      $1,750.00

Total                                                                                              $21,974.35

Additional UCF Cost-sharing funds were expended on labor, and cluster computer
laboratory hardware and software support.

Appendix B - Benchmark results for OPCODE I

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
66000        Ns
2            # of NBs
60 64        NBs
1            # of process grids (P x Q)
12           Ps
12           Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
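
The problem size N in the input file is bounded by memory, because the N x N coefficient matrix is stored in double precision and spread over the P x Q process grid. The sketch below estimates that footprint for the values above. It ignores the right-hand side and any workspace, and the function name is hypothetical.

```python
def hpl_matrix_bytes(n, word_bytes=8):
    """Bytes needed to store an n x n matrix of double-precision values."""
    return n * n * word_bytes


N, P, Q = 66000, 12, 12  # values from the input file
total = hpl_matrix_bytes(N)
per_process = total // (P * Q)
print(f"total: {total / 2**30:.1f} GiB")
print(f"per process: {per_process / 2**20:.1f} MiB over {P * Q} processes")
```

Each of the 144 processes therefore holds roughly a quarter of a gigabyte of matrix data, which is the quantity to compare against the memory installed per node when choosing N.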


HPLinpack  1.0  —  High-Performance  Linpack  benchmark  —  September  21,  2000 

Written  by  A.  Petitet  and  R.  Clint  Whaley,  Innovative  Computing  Labs.,  UTK 


An  explanation  of  the  input/output  parameters  follows: 

T/V  :  Wall  time  /  encoded  variant. 

N  :  The  order  of  the  coefficient  matrix  A. 

NB  :  The  partitioning  blocking  factor. 

P  :  The  number  of  process  rows. 

Q  :  The  number  of  process  columns. 

Time  :  Time  in  seconds  to  solve  the  linear  system. 

Gflops  :  Rate  of  execution  for  solving  the  linear  system. 

The following parameter values will be used:

N      :   66000
NB     :      60       64
P      :      12
Q      :      12
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :  2ringM
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

T/V        N      NB   P   Q     Time     Gflops
W03L2L2    66000  60   12  12    2297.85  8.341e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2L4    66000  60   12  12    2255.11  8.499e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207939  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084675  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015126  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2C2    66000  60   12  12    2263.99  8.466e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2C4    66000  60   12  12    2258.37  8.487e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0257147  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0104713  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0018706  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2R2    66000  60   12  12    2249.42  8.521e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2R4    66000  60   12  12    2259.19  8.484e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0211940  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0086304  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015418  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2L2    66000  60   12  12    2243.15  8.545e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2L4    66000  60   12  12    2253.61  8.505e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207939  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084675  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015126  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2C2    66000  60   12  12    2243.70  8.543e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2C4    66000  60   12  12    2254.58  8.501e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0257147  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0104713  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0018706  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2R2    66000  60   12  12    2250.39  8.517e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2R4    66000  60   12  12    2255.29  8.499e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0211940  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0086304  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015418  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2L2    66000  60   12  12    2255.07  8.500e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2L4    66000  60   12  12    2245.12  8.537e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207939  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084675  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015126  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2C2    66000  60   12  12    2248.07  8.526e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2C4    66000  60   12  12    2251.32  8.514e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0257147  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0104713  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0018706  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2R2    66000  60   12  12    2252.19  8.510e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0207560  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0084520  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015099  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2R4    66000  60   12  12    2252.21  8.510e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0211940  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0086304  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015418  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2L2    66000  64   12  12    2231.95  8.588e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2L4    66000  64   12  12    2238.98  8.561e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217791  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088687  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015843  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2C2    66000  64   12  12    2231.68  8.589e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2C4    66000  64   12  12    2237.69  8.566e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0233270  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0094990  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0016969  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2R2    66000  64   12  12    2240.78  8.554e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03L2R4    66000  64   12  12    2235.19  8.575e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0209615  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0085357  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015248  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2L2    66000  64   12  12    2252.16  8.511e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2L4    66000  64   12  12    2229.40  8.597e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217791  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088687  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015843  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2C2    66000  64   12  12    2239.24  8.560e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2C4    66000  64   12  12    2239.84  8.557e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0233270  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0094990  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0016969  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2R2    66000  64   12  12    2237.44  8.567e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03C2R4    66000  64   12  12    2234.10  8.579e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0209615  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0085357  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015248  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2L2    66000  64   12  12    2233.78  8.581e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2L4    66000  64   12  12    2230.70  8.592e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217791  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088687  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015843  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2C2    66000  64   12  12    2240.57  8.555e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2C4    66000  64   12  12    2234.19  8.579e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0233270  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0094990  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0016969  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2R2    66000  64   12  12    2227.54  8.605e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0217185  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0088440  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015799  PASSED

T/V        N      NB   P   Q     Time     Gflops
W03R2R4    66000  64   12  12    2239.26  8.560e+01
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0209615  PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0085357  PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0015248  PASSED


Finished  36  tests  with  the  following  results: 

36  tests  completed  and  passed  residual  checks,

0  tests  completed  and  failed  residual  checks, 

0  tests  skipped  because  of  illegal  input  values. 


End  of  Tests. 
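The count of 36 tests follows from the input file: 2 block sizes (NB) x 3 recursive factorizations (RFACT) x 3 panel factorizations (PFACT) x 2 stopping criteria (NBMIN), with depth, broadcast and NDIV each fixed at a single value. The sketch below rebuilds the sequence of variant labels under the assumption, inferred only from the order of the listing above, that NB varies slowest and NBMIN fastest. The function name is illustrative and is not part of HPL.

```python
from itertools import product

# Swept values, taken from the benchmark input file.
NBS = [60, 64]
RFACTS = ["L", "C", "R"]   # Left, Crout, Right
PFACTS = ["L", "C", "R"]
NBMINS = [2, 4]

# Values held fixed for every test in this run.
DEPTH, BCAST, NDIV = 0, 3, 2


def enumerate_runs():
    """Return (label, NB) pairs in the order they appear in the listing."""
    runs = []
    # product() advances its last argument fastest, so NBMIN is innermost.
    for nb, rfact, pfact, nbmin in product(NBS, RFACTS, PFACTS, NBMINS):
        label = f"W{DEPTH}{BCAST}{rfact}{NDIV}{pfact}{nbmin}"
        runs.append((label, nb))
    return runs


if __name__ == "__main__":
    for label, nb in enumerate_runs():
        print(label, nb)
```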


Explanation  of  Benchmark  results: 

All  of  the  output  lines  of  importance  will  look  like  this. 


W03L2L2  66000  60  12  12  2297.85  8.341e+01 


W:          don't know (the T/V legend in the output suggests wall time)
0:          depth
3:          type of broadcast
L:          RFACTs  recursive panel factorization (Left, Crout, Right)
2:          NDIV
L:          PFACTs  panel factorization (Left, Crout, Right)
2:          NBMIN  recursion stops when the current panel is made of
            less than or equal to NBMIN columns
66000:      N  The order of the coefficient matrix A
60:         NB  The partitioning blocking factor
12:         P  the number of process rows
12:         Q  the number of process columns
2297.85:    Time in seconds
8.341e+01:  Processing speed in Gflops
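The field-by-field reading above can be sketched as a small parser. The example below splits one result line into named fields and recomputes the Gflops figure from N and the run time. Two points are assumptions rather than statements from this report: each numeric field in the variant string is taken to be a single digit, and the operation count is taken to be 2/3 N^3 + 3/2 N^2, the figure conventionally used for LU factorization plus the triangular solve. The class and function names are illustrative.

```python
from dataclasses import dataclass

# Letter codes shared by the RFACT and PFACT fields.
FACT_NAMES = {"L": "Left", "C": "Crout", "R": "Right"}

# Broadcast codes, copied from the BCASTs line of the input file.
BCAST_NAMES = {0: "1rg", 1: "1rM", 2: "2rg", 3: "2rM", 4: "Lng", 5: "LnM"}


@dataclass
class HplResult:
    depth: int
    bcast: str
    rfact: str
    ndiv: int
    pfact: str
    nbmin: int
    n: int
    nb: int
    p: int
    q: int
    seconds: float
    gflops: float


def parse_result_line(line: str) -> HplResult:
    """Split a line such as 'W03L2L2 66000 60 12 12 2297.85 8.341e+01'."""
    variant, n, nb, p, q, seconds, gflops = line.split()
    # Assumption: the variant is 'W' followed by six single-character fields.
    if len(variant) != 7 or variant[0] != "W":
        raise ValueError(f"unexpected variant string: {variant!r}")
    return HplResult(
        depth=int(variant[1]),
        bcast=BCAST_NAMES[int(variant[2])],
        rfact=FACT_NAMES[variant[3]],
        ndiv=int(variant[4]),
        pfact=FACT_NAMES[variant[5]],
        nbmin=int(variant[6]),
        n=int(n), nb=int(nb), p=int(p), q=int(q),
        seconds=float(seconds), gflops=float(gflops),
    )


def gflops_from_time(n: int, seconds: float) -> float:
    """Rate in Gflops, assuming 2/3*N^3 + 3/2*N^2 floating-point operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    result = parse_result_line("W03L2L2  66000  60  12  12  2297.85  8.341e+01")
    print(result)
    print(f"recomputed rate: {gflops_from_time(result.n, result.seconds):.2f} Gflops")
```

Under these assumptions the recomputed rate agrees with the reported Gflops column to the printed precision, which is a quick way to catch transcription errors in the Time or Gflops values.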

Here  are  some  links  with  further  information: 

Main page:  http://www.netlib.org/benchmark/hpl/

Description of the algorithm:  http://www.netlib.org/benchmark/hpl/algorithm.html
FAQ:  http://www.netlib.org/benchmark/hpl/faqs.html
Tuning:  http://www.netlib.org/benchmark/hpl/tuning.html
Appendix  C  -  AMD  Proposal

AMD

University  Funding  Request 

Date:  September  12,  2001 

University:  University  of  Central  Florida  (UCF) 

Program  Name:  OPCODE 


Program  Overview: 

Provide  a  brief  description  of  the  program  for  which  you  are  requesting  support. 


The OPCODE Cluster project is a collaboration among several projects and funding
sources. Chief among these is the original OPCODE project funded by the Air Force
Office of Scientific Research (AFOSR) under the Defense University Research
Instrumentation Program (DURIP). AFOSR has provided approximately $217,000 in funds
strictly for use in equipment purchases to build two large computing clusters at UCF.
In addition, partial matching funds have been provided by the UCF Office of Sponsored
Research ($27,000), the UCF School of Electrical Engineering and Computer Science
(SEECS) ($13,500), and the Institute for Simulation and Training (IST) ($13,500). The
OPCODE Cluster project is an outgrowth of an earlier cluster computing project jointly
funded by the Army Simulation and Training Instrumentation Command (STRICOM) and UCF,
which resulted in the construction of our original 16-node, 32-processor cluster.
STRICOM’s Advanced Tactical Engagement Simulation Program and UCF are providing an
additional $27,000 for the new OPCODE project, above and beyond the funds cited
previously. Dr. Guy Schiavone is Principal Investigator on these cluster projects, and is
also technical lead on another UCF cluster project, funded internally by UCF SEECS,
that has resulted in the recently completed “SCEROLA” cluster, employing 128 Athlon
Thunderbird processors running at 900 MHz.


System: 

Describe  the  cluster  system  you  intend  to  build. 


The OPCODE project will result in the construction of two 192-processor computing
clusters, one located at IST and the other at SEECS UCF. These clusters will employ a
dual-channel Fast Ethernet network, linked together over a high-speed, fully stacked
switch. The SEECS-OPCODE cluster will be linked on the same switch with the newly
completed 128-processor SEECS cluster cited above. At each node, we plan to employ dual
AMD processors on a Tyan “Tiger” K7 motherboard, with 1 GB of DDR RAM and 20 GB of
local disk storage per node. The Linux operating system will be used, and a large number
of supporting packages will be installed to support the general needs of research and
education in distributed and parallel computation. To support video and audio
processing, an alternative IEEE 1394 network will also be available. We have contributed
extensively to developing IEEE 1394 drivers for Linux, and one of our patches is
included in the Linux 2.4.x kernel.


Application: 

Why should AMD be particularly interested in your proposed system, and what can it
offer to industry?


The  twin  OPCODE  clusters  being  developed  will  be  among  the  first  to  employ  dual  AMD
processors,  and  will  thus  demonstrate  the  power  and  cost  efficiency  of  the  new  AMD 
multiprocessing  capabilities.  Another  novel  aspect  of  these  clusters  is  the  alternative  IEEE  1394 
network  that  will  be  used  for  video  and  audio  processing.  For  example,  use  of  the  1394  network 
is  incorporated  in  a  pending  proposal  to  DARPA,  on  the  topic  of  Voice  Recognition  in 
Background  Noise,  a  surveillance  technology  particularly  relevant  in  light  of  recent  terrorist
attacks  on  our  country.  Another  unique  proposed  application  of  the  cluster  is  in  the  area  of 
Computer  Generated  Force  Simulations  for  the  Army  OneSAF  program.  The  use  of  cluster 
computing  in  this  area  will  result  in  the  generation  of  a  large  number  of  battlefield  entities  with 
more  sophisticated  behaviors  than  previously  possible.  We  plan  an  aggressive  benchmarking 
schedule,  and  fully  expect  these  computers  to  rank  on  the  list  of  Top  500  most  powerful 
computers  in  the  world.  In  addition,  we  are  developing  graduate  and  undergraduate  level  course 
curricula  in  the  area  of  cluster  computing. 


Benefits 

Describe  what  AMD  gains  by  participation.  Why  is  this  a  good  investment? 


The  OPCODE  project  is  a  large,  high-visibility  project  that  will  result  in  a  number  of  unique 
applications  of  cluster  computing.  We  believe  that  our  experience  and  vision  in  the  area  of 
cluster  computing  will  result  in  a  number  of  notable  accomplishments,  and  we  plan  to 
aggressively  publicize  these  accomplishments,  both  in  the  usual  academic  circles,  and  in  the 
private  sector.  Our  clusters  will  serve  as  a  showcase  for  the  success  of  AMD  multiprocessing. 
We  are  strong  proponents  of  AMD  technology,  and  would  welcome  the  opportunity  to  share  the 
credit  with  AMD  for  our  upcoming  successes  in  the  area  of  cluster  computing. 


Permissions 

Will AMD have the University's permission to post your story on our Web site, in printed
collateral, AMD customer presentations, and in materials sent to the media?

Yes, most definitely.


Proposal  for  2001  ('01-'02):

Provide  details  on  funding  recommendation. 


Our  current  budget  is  stretched,  and  will  not  accommodate  the  use  of  the  new  AMD  Athlon  4 
(Palomino)  processors  that  are  recommended  for  multiprocessing  by  AMD.  We  are  requesting  a 
donation  of  192  AMD  Athlon  4  (Palomino)  processors  from  AMD  to  be  used  in  our  two 
OPCODE  clusters.  We  will  purchase  the  remaining  192  processors  using  our  existing  project 
funds. 


References: 

Points-of-Contact:  Dr.  Guy  Schiavone 

Assistant  Professor  of  Computer  Engineering 
School  of  Electrical  and  Computer  Engineering 
University  of  Central  Florida 
(407)882-1300 
guy@ist.ucf.edu